AITopics | receipt image

Collaborating Authors

receipt image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts

Le, Thanh-Phong, Phan, Trung Le Chi, Nguyen, Nghia Hieu, Van Nguyen, Kiet

arXiv.org Artificial IntelligenceMar-7-2025

Document Visual Question Answering (Document VQA) challenges multimodal systems to holistically handle textual, layout, and visual modalities to provide appropriate answers. Document VQA has gained popularity in recent years due to the increasing amount of documents and the high demand for digitization. Nonetheless, most of document VQA datasets are developed in high-resource languages such as English. In this paper, we present ReceiptVQA (\textbf{Receipt} \textbf{V}isual \textbf{Q}uestion \textbf{A}nswering), the initial large-scale document VQA dataset in Vietnamese dedicated to receipts, a document kind with high commercial potentials. The dataset encompasses \textbf{9,000+} receipt images and \textbf{60,000+} manually annotated question-answer pairs. In addition to our study, we introduce LiGT (\textbf{L}ayout-\textbf{i}nfused \textbf{G}enerative \textbf{T}ransformer), a layout-aware encoder-decoder architecture designed to leverage embedding layers of language models to operate layout embeddings, minimizing the use of additional neural modules. Experiments on ReceiptVQA show that our architecture yielded promising performance, achieving competitive results compared with outstanding baselines. Furthermore, throughout analyzing experimental results, we found evident patterns that employing encoder-only model architectures has considerable disadvantages in comparison to architectures that can generate answers. We also observed that it is necessary to combine multiple modalities to tackle our dataset, despite the critical role of semantic understanding from language models. We hope that our work will encourage and facilitate future development in Vietnamese document VQA, contributing to a diverse multimodal research community in the Vietnamese language.

architecture, dataset, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2502.19202

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
(2 more...)

Add feedback

CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset

Abdallah, Abdelrahman, Abdalla, Mahmoud, Kasem, Mahmoud SalahEldin, Mahmoud, Mohamed, Abdelhalim, Ibrahim, Elkasaby, Mohamed, ElBendary, Yasser, Jatowt, Adam

arXiv.org Artificial IntelligenceJun-6-2024

In the fields of Optical Character Recognition (OCR) and Natural Language Processing (NLP), integrating multilingual capabilities remains a critical challenge, especially when considering languages with complex scripts such as Arabic. This paper introduces the Comprehensive Post-OCR Parsing and Receipt Understanding Dataset (CORU), a novel dataset specifically designed to enhance OCR and information extraction from receipts in multilingual contexts involving Arabic and English. CORU consists of over 20,000 annotated receipts from diverse retail settings, including supermarkets and clothing stores, alongside 30,000 annotated images for OCR that were utilized to recognize each detected line, and 10,000 items annotated for detailed information extraction. These annotations capture essential details such as merchant names, item descriptions, total prices, receipt numbers, and dates. They are structured to support three primary computational tasks: object detection, OCR, and information extraction. We establish the baseline performance for a range of models on CORU to evaluate the effectiveness of traditional methods, like Tesseract OCR, and more advanced neural network-based approaches. These baselines are crucial for processing the complex and noisy document layouts typical of real-world receipts and for advancing the state of automated multilingual document processing. Our datasets are publicly accessible (https://github.com/Update-For-Integrated-Business-AI/CORU).

dataset, information extraction, receipt, (11 more...)

arXiv.org Artificial Intelligence

2406.04493

Country: Europe > Austria > Tyrol > Innsbruck (0.04)

Genre: Research Report (0.50)

Industry:

Retail (0.86)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.87)

Add feedback

Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images

Zhang, Hongkuan, Whittaker, Edward, Kitagishi, Ikuo

arXiv.org Artificial IntelligenceOct-16-2023

Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models only focus on the cropped text instance images, which require the bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance adds complexity, however instance-level OCR models have very low accuracy when processing the whole image for the document-level OCR, such as receipt images containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks, and gradually increase the image chunk size to generalize the recognition ability from instance images to full-page images. In our experiments on the SROIE receipt OCR dataset, the model finetuned with our strategy achieved 64.4 F1-score and a 22.8% character error rate (CER), respectively, which outperforms the baseline results with 48.5 F1-score and 50.6% CER. The best model, which splits the full image into 15 equally sized chunks, gives 87.8 F1-score and 4.98% CER with minimal additional pre or post-processing of the output. Moreover, the characters in the generated document-level sequences are arranged in the reading order, which is practical for real-world applications.

receipt image, recognition, total, (16 more...)

arXiv.org Artificial Intelligence

2212.05525

Country: Asia (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Automatic Detection and Rectification of Paper Receipts on Smartphones

Whittaker, Edward, Tanaka, Masashi, Kitagishi, Ikuo

arXiv.org Artificial IntelligenceMar-10-2023

We describe the development of a real-time smartphone app that allows the user to digitize paper receipts in a novel way by "waving" their phone over the receipts and letting the app automatically detect and rectify the receipts for subsequent text recognition. We show that traditional computer vision algorithms for edge and corner detection do not robustly detect the non-linear and discontinuous edges and corners of a typical paper receipt in real-world settings. This is particularly the case when the colors of the receipt and background are similar, or where other interfering rectangular objects are present. Inaccurate detection of a receipt's corner positions then results in distorted images when using an affine projective transformation to rectify the perspective. We propose an innovative solution to receipt corner detection by treating each of the four corners as a unique "object", and training a Single Shot Detection MobileNet object detection model. We use a small amount of real data and a large amount of automatically generated synthetic data that is designed to be similar to real-world imaging scenarios. We show that our proposed method robustly detects the four corners of a receipt, giving a receipt detection accuracy of 85.3% on real-world data, compared to only 36.9% with a traditional edge detection-based approach. Our method works even when the color of the receipt is virtually indistinguishable from the background. Moreover, our method is trained to detect only the corners of the central target receipt and implicitly learns to ignore other receipts, and other rectangular objects. Including synthetic data allows us to train an even better model. These factors are a major advantage over traditional edge detection-based approaches, allowing us to deliver a much better experience to the user.

artificial intelligence, machine learning, pattern recognition, (16 more...)

arXiv.org Artificial Intelligence

2303.05763

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)

Add feedback

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

Huang, Zheng, Chen, Kai, He, Jianhua, Bai, Xiang, Karatzas, Dimosthenis, Lu, Shjian, Jawahar, C. V.

arXiv.org Artificial IntelligenceMar-18-2021

Scanned receipts OCR and key information extraction (SROIE) represent the processeses of recognizing text from scanned receipts and extracting key texts from them and save the extracted tests to structured documents. SROIE plays critical roles for many document analysis applications and holds great commercial potentials, but very little research works and advances have been published in this area. In recognition of the technical challenges, importance and huge commercial potentials of SROIE, we organized the ICDAR 2019 competition on SROIE. In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3). A new dataset with 1000 whole scanned receipt images and annotations is created for the competition. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, submission statistics, performance of submitted methods and results analysis.

competition, dataset, information extraction, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICDAR.2019.00244

2103.10213

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Singapore (0.04)
Asia > India > Telangana > Hyderabad (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.96)
Information Technology > Data Science > Data Mining > Text Mining (0.86)

Add feedback